Dataset statistics
| Number of variables | 14 |
|---|---|
| Number of observations | 4362 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 477.2 KiB |
| Average record size in memory | 112.0 B |
Variable types
| Numeric | 14 |
|---|---|

| Alert | Type |
|---|---|
| recency_p is highly correlated with avg_recency_days | High correlation |
| quantity_p is highly correlated with gross_revenue and 2 other fields | High correlation |
| avg_recency_days is highly correlated with recency_p | High correlation |
| gross_revenue is highly correlated with quantity_p and 3 other fields | High correlation |
| relative_revenue is highly correlated with quantity_p and 3 other fields | High correlation |
| relative_quantity is highly correlated with quantity_p and 2 other fields | High correlation |
| relative_invoices is highly correlated with gross_revenue and 1 other fields | High correlation |
| recency_p is highly correlated with avg_recency_days and 1 other fields | High correlation |
| quantity_p is highly correlated with avg_basket_size and 4 other fields | High correlation |
| avg_ticket is highly correlated with avg_variety | High correlation |
| avg_recency_days is highly correlated with recency_p | High correlation |
| avg_basket_size is highly correlated with quantity_p and 3 other fields | High correlation |
| avg_variety is highly correlated with avg_ticket | High correlation |
| gross_revenue is highly correlated with quantity_p and 4 other fields | High correlation |
| relative_revenue is highly correlated with quantity_p and 4 other fields | High correlation |
| relative_quantity is highly correlated with quantity_p and 4 other fields | High correlation |
| relative_invoices is highly correlated with recency_p and 4 other fields | High correlation |
| recency_p is highly correlated with avg_recency_days | High correlation |
| quantity_p is highly correlated with avg_basket_size and 3 other fields | High correlation |
| avg_recency_days is highly correlated with recency_p | High correlation |
| avg_basket_size is highly correlated with quantity_p and 1 other fields | High correlation |
| gross_revenue is highly correlated with quantity_p and 3 other fields | High correlation |
| relative_revenue is highly correlated with quantity_p and 3 other fields | High correlation |
| relative_quantity is highly correlated with quantity_p and 3 other fields | High correlation |
| relative_invoices is highly correlated with gross_revenue and 1 other fields | High correlation |
| avg_recency_days is highly correlated with df_index and 1 other fields | High correlation |
| quantity_d is highly correlated with quantity_p and 1 other fields | High correlation |
| quantity_p is highly correlated with quantity_d and 5 other fields | High correlation |
| df_index is highly correlated with avg_recency_days and 1 other fields | High correlation |
| relative_invoices is highly correlated with quantity_p and 3 other fields | High correlation |
| recency_p is highly correlated with avg_recency_days and 1 other fields | High correlation |
| gross_revenue is highly correlated with quantity_p and 4 other fields | High correlation |
| avg_ticket is highly correlated with avg_basket_size | High correlation |
| avg_basket_size is highly correlated with quantity_p and 4 other fields | High correlation |
| relative_revenue is highly correlated with quantity_p and 4 other fields | High correlation |
| relative_quantity is highly correlated with quantity_d and 5 other fields | High correlation |
| quantity_d is highly skewed (γ1 = 32.25098083) | Skewed |
| gross_revenue is highly skewed (γ1 = 21.72254957) | Skewed |
| relative_revenue is highly skewed (γ1 = 21.72254957) | Skewed |
| relative_quantity is highly skewed (γ1 = 20.26712382) | Skewed |
| df_index has unique values | Unique |
| customer_id has unique values | Unique |
| quantity_d has 2778 (63.7%) zeros | Zeros |
| relative_invoices has 257 (5.9%) zeros | Zeros |
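The skewness alerts above report γ1, the standardized third moment. As a rough illustration only, the sketch below computes the population (biased) form of γ1 on made-up data. The function name `skewness` and both sample lists are assumptions for this example; the report itself most likely relies on pandas, whose `Series.skew` applies a bias correction, so the numbers here would not match the report exactly.

```python
def skewness(values):
    """Population skewness: third central moment divided by std**3.

    Assumption: biased (population) moments, not the bias-corrected
    variant that pandas uses.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        return 0.0  # constant data: treat skewness as zero
    return m3 / m2 ** 1.5


# Hypothetical data: a symmetric sample and a zero-heavy, right-tailed one.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [0, 0, 0, 0, 1, 1, 2, 100]

print(skewness(symmetric))
print(skewness(right_tailed))
```

A column such as quantity_d, where most values are zero and a few are very large, has the same shape as `right_tailed`, which is why its γ1 is strongly positive.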
Reproduction
| Analysis started | 2021-06-14 23:30:31.822739 |
|---|---|
| Analysis finished | 2021-06-14 23:31:01.285519 |
| Duration | 29.46 seconds |
| Software version | pandas-profiling v3.0.0 |
| Download configuration | config.json |
df_index
| Distinct | 4362 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2837.824622 |
| Minimum | 0 |
| Maximum | 5970 |
| Zeros | 1 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 34.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 229.05 |
| Q1 | 1303.25 |
| median | 2733.5 |
| Q3 | 4424.75 |
| 95-th percentile | 5635.95 |
| Maximum | 5970 |
| Range | 5970 |
| Interquartile range (IQR) | 3121.5 |
Descriptive statistics
| Standard deviation | 1758.838539 |
|---|---|
| Coefficient of variation (CV) | 0.6197840859 |
| Kurtosis | -1.239013548 |
| Mean | 2837.824622 |
| Median Absolute Deviation (MAD) | 1551 |
| Skewness | 0.1031975413 |
| Sum | 12378591 |
| Variance | 3093513.007 |
| Monotonicity | Strictly increasing |
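Several of the statistics above are derived from one another. The short check below copies the reported mean, standard deviation, quartiles and extremes from the tables above and recomputes the coefficient of variation, variance, IQR and range. The variable names are chosen for this example only, and small differences from the report are expected because the inputs are rounded.

```python
# Values copied from the quantile and descriptive statistics above.
mean = 2837.824622
std = 1758.838539
q1, q3 = 1303.25, 4424.75
minimum, maximum = 0, 5970

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range

print(f"CV={cv:.6f} variance={variance:.1f} IQR={iqr} range={value_range}")
```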
| Value | Count | Frequency (%) |
| 0 | 1 | < 0.1% |
| 5348 | 1 | < 0.1% |
| 3311 | 1 | < 0.1% |
| 5356 | 1 | < 0.1% |
| 3307 | 1 | < 0.1% |
| 1258 | 1 | < 0.1% |
| 3303 | 1 | < 0.1% |
| 1254 | 1 | < 0.1% |
| 1250 | 1 | < 0.1% |
| 1426 | 1 | < 0.1% |
| Other values (4352) | 4352 |
| Value | Count | Frequency (%) |
| 0 | 1 | |
| 1 | 1 | |
| 2 | 1 | |
| 3 | 1 | |
| 4 | 1 | |
| 5 | 1 | |
| 6 | 1 | |
| 7 | 1 | |
| 8 | 1 | |
| 9 | 1 |
| Value | Count | Frequency (%) |
| 5970 | 1 | |
| 5963 | 1 | |
| 5962 | 1 | |
| 5960 | 1 | |
| 5958 | 1 | |
| 5954 | 1 | |
| 5953 | 1 | |
| 5952 | 1 | |
| 5951 | 1 | |
| 5950 | 1 |
customer_id
| Distinct | 4362 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 15301.81041 |
| Minimum | 12347 |
| Maximum | 18287 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 34.2 KiB |
Quantile statistics
| Minimum | 12347 |
|---|---|
| 5-th percentile | 12615.05 |
| Q1 | 13814.25 |
| median | 15303.5 |
| Q3 | 16780.75 |
| 95-th percentile | 17984.95 |
| Maximum | 18287 |
| Range | 5940 |
| Interquartile range (IQR) | 2966.5 |
Descriptive statistics
| Standard deviation | 1722.249578 |
|---|---|
| Coefficient of variation (CV) | 0.1125520139 |
| Kurtosis | -1.196671423 |
| Mean | 15301.81041 |
| Median Absolute Deviation (MAD) | 1485 |
| Skewness | 0.0005631275836 |
| Sum | 66746497 |
| Variance | 2966143.609 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 16384 | 1 | < 0.1% |
| 13644 | 1 | < 0.1% |
| 13656 | 1 | < 0.1% |
| 17750 | 1 | < 0.1% |
| 15701 | 1 | < 0.1% |
| 13652 | 1 | < 0.1% |
| 17746 | 1 | < 0.1% |
| 17742 | 1 | < 0.1% |
| 17738 | 1 | < 0.1% |
| 13816 | 1 | < 0.1% |
| Other values (4352) | 4352 |
| Value | Count | Frequency (%) |
| 12347 | 1 | |
| 12348 | 1 | |
| 12349 | 1 | |
| 12350 | 1 | |
| 12352 | 1 | |
| 12353 | 1 | |
| 12354 | 1 | |
| 12355 | 1 | |
| 12356 | 1 | |
| 12357 | 1 |
| Value | Count | Frequency (%) |
| 18287 | 1 | |
| 18283 | 1 | |
| 18282 | 1 | |
| 18281 | 1 | |
| 18280 | 1 | |
| 18278 | 1 | |
| 18277 | 1 | |
| 18276 | 1 | |
| 18274 | 1 | |
| 18273 | 1 |
recency_p
| Distinct | 304 |
|---|---|
| Distinct (%) | 7.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 93.99174691 |
| Minimum | 0 |
| Maximum | 373 |
| Zeros | 34 |
| Zeros (%) | 0.8% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 34.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 17 |
| median | 51 |
| Q3 | 147 |
| 95-th percentile | 318 |
| Maximum | 373 |
| Range | 373 |
| Interquartile range (IQR) | 130 |
Descriptive statistics
| Standard deviation | 102.3910002 |
|---|---|
| Coefficient of variation (CV) | 1.089361604 |
| Kurtosis | 0.3847161405 |
| Mean | 93.99174691 |
| Median Absolute Deviation (MAD) | 41 |
| Skewness | 1.237891749 |
| Sum | 409992 |
| Variance | 10483.91692 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 103 | 2.4% |
| 3 | 94 | 2.2% |
| 4 | 94 | 2.2% |
| 2 | 90 | 2.1% |
| 8 | 79 | 1.8% |
| 10 | 77 | 1.8% |
| 17 | 74 | 1.7% |
| 7 | 72 | 1.7% |
| 9 | 71 | 1.6% |
| 22 | 64 | 1.5% |
| Other values (294) | 3544 |
| Value | Count | Frequency (%) |
| 0 | 34 | 0.8% |
| 1 | 103 | |
| 2 | 90 | |
| 3 | 94 | |
| 4 | 94 | |
| 5 | 48 | |
| 7 | 72 | |
| 8 | 79 | |
| 9 | 71 | |
| 10 | 77 |
| Value | Count | Frequency (%) |
| 373 | 17 | 0.4% |
| 372 | 18 | |
| 371 | 6 | 0.1% |
| 369 | 3 | 0.1% |
| 368 | 5 | 0.1% |
| 367 | 5 | 0.1% |
| 366 | 10 | 0.2% |
| 365 | 43 | |
| 364 | 6 | 0.1% |
| 362 | 6 | 0.1% |
quantity_p
| Distinct | 774 |
|---|---|
| Distinct (%) | 17.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 274.4497937 |
| Minimum | 0 |
| Maximum | 38639 |
| Zeros | 33 |
| Zeros (%) | 0.8% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 34.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 18 |
| Q1 | 60 |
| median | 115 |
| Q3 | 233.75 |
| 95-th percentile | 702 |
| Maximum | 38639 |
| Range | 38639 |
| Interquartile range (IQR) | 173.75 |
Descriptive statistics
| Standard deviation | 1050.93695 |
|---|---|
| Coefficient of variation (CV) | 3.829250283 |
| Kurtosis | 519.6817487 |
| Mean | 274.4497937 |
| Median Absolute Deviation (MAD) | 69 |
| Skewness | 19.0470736 |
| Sum | 1197150 |
| Variance | 1104468.473 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 28 | 42 | 1.0% |
| 60 | 40 | 0.9% |
| 67 | 39 | 0.9% |
| 51 | 37 | 0.8% |
| 72 | 36 | 0.8% |
| 52 | 35 | 0.8% |
| 66 | 34 | 0.8% |
| 0 | 33 | 0.8% |
| 70 | 33 | 0.8% |
| 90 | 32 | 0.7% |
| Other values (764) | 4001 |
| Value | Count | Frequency (%) |
| 0 | 33 | |
| 1 | 11 | 0.3% |
| 2 | 5 | 0.1% |
| 3 | 14 | |
| 4 | 6 | 0.1% |
| 5 | 2 | < 0.1% |
| 6 | 14 | |
| 7 | 3 | 0.1% |
| 8 | 5 | 0.1% |
| 9 | 8 | 0.2% |
| Value | Count | Frequency (%) |
| 38639 | 1 | |
| 21352 | 1 | |
| 17376 | 1 | |
| 17150 | 1 | |
| 16288 | 1 | |
| 15853 | 1 | |
| 13369 | 1 | |
| 12872 | 1 | |
| 10828 | 1 | |
| 10399 | 1 |
quantity_d
| Distinct | 183 |
|---|---|
| Distinct (%) | 4.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 18.09903714 |
| Minimum | -0 |
| Maximum | 9361 |
| Zeros | 2778 |
| Zeros (%) | 63.7% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 34.2 KiB |
Quantile statistics
| Minimum | -0 |
|---|---|
| 5-th percentile | -0 |
| Q1 | -0 |
| median | -0 |
| Q3 | 3 |
| 95-th percentile | 40.95 |
| Maximum | 9361 |
| Range | 9361 |
| Interquartile range (IQR) | 3 |
Descriptive statistics
| Standard deviation | 195.7888619 |
|---|---|
| Coefficient of variation (CV) | 10.81763965 |
| Kurtosis | 1332.556806 |
| Mean | 18.09903714 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 32.25098083 |
| Sum | 78948 |
| Variance | 38333.27843 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| -0 | 2778 | |
| 1 | 349 | 8.0% |
| 3 | 173 | 4.0% |
| 6 | 90 | 2.1% |
| 2 | 89 | 2.0% |
| 4 | 75 | 1.7% |
| 5 | 45 | 1.0% |
| 12 | 45 | 1.0% |
| 7 | 42 | 1.0% |
| 8 | 40 | 0.9% |
| Other values (173) | 636 | 14.6% |
| Value | Count | Frequency (%) |
| -0 | 2778 | |
| 1 | 349 | 8.0% |
| 2 | 89 | 2.0% |
| 3 | 173 | 4.0% |
| 4 | 75 | 1.7% |
| 5 | 45 | 1.0% |
| 6 | 90 | 2.1% |
| 7 | 42 | 1.0% |
| 8 | 40 | 0.9% |
| 9 | 37 | 0.8% |
| Value | Count | Frequency (%) |
| 9361 | 1 | |
| 4873 | 1 | |
| 4027 | 1 | |
| 2399 | 1 | |
| 2302 | 1 | |
| 2160 | 1 | |
| 1685 | 1 | |
| 1608 | 1 | |
| 1515 | 1 | |
| 1350 | 1 |
avg_ticket
| Distinct | 4300 |
|---|---|
| Distinct (%) | 98.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 33.31553376 |
| Minimum | 0 |
| Maximum | 3861 |
| Zeros | 33 |
| Zeros (%) | 0.8% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 34.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 4.355708812 |
| Q1 | 11.98439402 |
| median | 17.66815126 |
| Q3 | 24.736875 |
| 95-th percentile | 90.50791667 |
| Maximum | 3861 |
| Range | 3861 |
| Interquartile range (IQR) | 12.75248098 |
Descriptive statistics
| Standard deviation | 110.1241737 |
|---|---|
| Coefficient of variation (CV) | 3.305490301 |
| Kurtosis | 544.0449692 |
| Mean | 33.31553376 |
| Median Absolute Deviation (MAD) | 6.432807882 |
| Skewness | 19.83606126 |
| Sum | 145322.3583 |
| Variance | 12127.33363 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 33 | 0.8% |
| 15 | 4 | 0.1% |
| 76.32 | 4 | 0.1% |
| 179 | 4 | 0.1% |
| 18.7 | 3 | 0.1% |
| 25.5 | 3 | 0.1% |
| 24.4 | 2 | < 0.1% |
| 20.8 | 2 | < 0.1% |
| 30 | 2 | < 0.1% |
| 358 | 2 | < 0.1% |
| Other values (4290) | 4303 |
| Value | Count | Frequency (%) |
| 0 | 33 | |
| 2.101285714 | 1 | < 0.1% |
| 2.150588235 | 1 | < 0.1% |
| 2.241 | 1 | < 0.1% |
| 2.264375 | 1 | < 0.1% |
| 2.4325 | 1 | < 0.1% |
| 2.462371134 | 1 | < 0.1% |
| 2.504876033 | 1 | < 0.1% |
| 2.50837156 | 1 | < 0.1% |
| 2.54704918 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 3861 | 1 | |
| 3096 | 1 | |
| 2033.1 | 1 | |
| 2027.86 | 1 | |
| 1687.2 | 1 | |
| 1377.077778 | 1 | |
| 1001.2 | 1 | |
| 952.9875 | 1 | |
| 931.5 | 1 | |
| 872.13 | 1 |
avg_recency_days
Real number (ℝ≥0)
HIGH CORRELATION
| Distinct | 1276 |
|---|---|
| Distinct (%) | 29.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 94.55606587 |
| Minimum | 0 |
| Maximum | 373 |
| Zeros | 1 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 34.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 14 |
| Q1 | 34.58333333 |
| median | 63.6 |
| Q3 | 122.6666667 |
| 95-th percentile | 298 |
| Maximum | 373 |
| Range | 373 |
| Interquartile range (IQR) | 88.08333333 |
Descriptive statistics
| Standard deviation | 86.50864489 |
|---|---|
| Coefficient of variation (CV) | 0.9148925994 |
| Kurtosis | 1.732227692 |
| Mean | 94.55606587 |
| Median Absolute Deviation (MAD) | 35.66666667 |
| Skewness | 1.543552023 |
| Sum | 412453.5593 |
| Variance | 7483.745641 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 53 | 29 | 0.7% |
| 30 | 24 | 0.6% |
| 18 | 23 | 0.5% |
| 39 | 22 | 0.5% |
| 106 | 22 | 0.5% |
| 24 | 21 | 0.5% |
| 25 | 20 | 0.5% |
| 92 | 20 | 0.5% |
| 28 | 20 | 0.5% |
| 15 | 19 | 0.4% |
| Other values (1266) | 4142 |
| Value | Count | Frequency (%) |
| 0 | 1 | < 0.1% |
| 1 | 4 | |
| 2 | 5 | |
| 2.554794521 | 1 | < 0.1% |
| 3 | 9 | |
| 3.243478261 | 1 | < 0.1% |
| 3.300884956 | 1 | < 0.1% |
| 3.333333333 | 1 | < 0.1% |
| 3.5 | 1 | < 0.1% |
| 3.666666667 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 373 | 15 | |
| 372 | 17 | |
| 371 | 7 | |
| 369 | 3 | 0.1% |
| 368 | 5 | 0.1% |
| 367 | 5 | 0.1% |
| 366 | 8 | |
| 365 | 10 | |
| 364 | 5 | 0.1% |
| 362 | 6 | 0.1% |
avg_basket_size
| Distinct | 2132 |
|---|---|
| Distinct (%) | 48.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 224.1642092 |
| Minimum | 0 |
| Maximum | 7824 |
| Zeros | 33 |
| Zeros (%) | 0.8% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 34.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 30.50833333 |
| Q1 | 91 |
| median | 160.3333333 |
| Q3 | 270 |
| 95-th percentile | 597.0020833 |
| Maximum | 7824 |
| Range | 7824 |
| Interquartile range (IQR) | 179 |
Descriptive statistics
| Standard deviation | 283.6856467 |
|---|---|
| Coefficient of variation (CV) | 1.265526052 |
| Kurtosis | 157.482962 |
| Mean | 224.1642092 |
| Median Absolute Deviation (MAD) | 81.66666667 |
| Skewness | 8.854115445 |
| Sum | 977804.2806 |
| Variance | 80477.54614 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 33 | 0.8% |
| 100 | 19 | 0.4% |
| 82 | 18 | 0.4% |
| 88 | 17 | 0.4% |
| 120 | 17 | 0.4% |
| 72 | 17 | 0.4% |
| 136 | 17 | 0.4% |
| 73 | 16 | 0.4% |
| 78 | 16 | 0.4% |
| 106 | 16 | 0.4% |
| Other values (2122) | 4176 |
| Value | Count | Frequency (%) |
| 0 | 33 | |
| 1 | 6 | 0.1% |
| 1.5 | 1 | < 0.1% |
| 2 | 5 | 0.1% |
| 3 | 2 | < 0.1% |
| 3.333333333 | 1 | < 0.1% |
| 4 | 8 | 0.2% |
| 5 | 3 | 0.1% |
| 5.333333333 | 1 | < 0.1% |
| 5.666666667 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 7824 | 1 | |
| 4300 | 1 | |
| 4280 | 1 | |
| 3684.47619 | 1 | |
| 3028 | 1 | |
| 2924 | 1 | |
| 2880 | 1 | |
| 2708 | 1 | |
| 2697.465753 | 1 | |
| 2529 | 1 |
avg_variety
| Distinct | 1043 |
|---|---|
| Distinct (%) | 23.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 22.03462946 |
| Minimum | 0 |
| Maximum | 300.6470588 |
| Zeros | 33 |
| Zeros (%) | 0.8% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 34.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 2.333333333 |
| Q1 | 9.2 |
| median | 16.96774194 |
| Q3 | 28 |
| 95-th percentile | 59 |
| Maximum | 300.6470588 |
| Range | 300.6470588 |
| Interquartile range (IQR) | 18.8 |
Descriptive statistics
| Standard deviation | 20.3113935 |
|---|---|
| Coefficient of variation (CV) | 0.9217941941 |
| Kurtosis | 18.75754511 |
| Mean | 22.03462946 |
| Median Absolute Deviation (MAD) | 8.878411911 |
| Skewness | 3.032532776 |
| Sum | 96115.05369 |
| Variance | 412.552706 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 97 | 2.2% |
| 13 | 90 | 2.1% |
| 14 | 84 | 1.9% |
| 10 | 82 | 1.9% |
| 11 | 81 | 1.9% |
| 9 | 78 | 1.8% |
| 6 | 75 | 1.7% |
| 7 | 74 | 1.7% |
| 5 | 71 | 1.6% |
| 8 | 69 | 1.6% |
| Other values (1033) | 3561 |
| Value | Count | Frequency (%) |
| 0 | 33 | 0.8% |
| 1 | 97 | |
| 1.2 | 1 | < 0.1% |
| 1.25 | 1 | < 0.1% |
| 1.333333333 | 2 | < 0.1% |
| 1.5 | 8 | 0.2% |
| 1.555555556 | 1 | < 0.1% |
| 1.571428571 | 1 | < 0.1% |
| 1.666666667 | 4 | 0.1% |
| 1.833333333 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 300.6470588 | 1 | |
| 219 | 1 | |
| 203.5 | 1 | |
| 191 | 1 | |
| 171 | 1 | |
| 164 | 1 | |
| 158 | 1 | |
| 157 | 1 | |
| 153 | 1 | |
| 149 | 1 |
purchases_pday
Real number (ℝ≥0)
| Distinct | 1243 |
|---|---|
| Distinct (%) | 28.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.4000047423 |
| Minimum | 0 |
| Maximum | 17 |
| Zeros | 33 |
| Zeros (%) | 0.8% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 34.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.009586200541 |
| Q1 | 0.01989485241 |
| median | 0.04474578328 |
| Q3 | 1 |
| 95-th percentile | 1 |
| Maximum | 17 |
| Range | 17 |
| Interquartile range (IQR) | 0.9801051476 |
Descriptive statistics
| Standard deviation | 0.5593853893 |
|---|---|
| Coefficient of variation (CV) | 1.398446894 |
| Kurtosis | 177.5266664 |
| Mean | 0.4000047423 |
| Median Absolute Deviation (MAD) | 0.03322942607 |
| Skewness | 6.699343025 |
| Sum | 1744.820686 |
| Variance | 0.3129120137 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 1501 | |
| 2 | 50 | 1.1% |
| 0 | 33 | 0.8% |
| 0.0625 | 18 | 0.4% |
| 0.02777777778 | 17 | 0.4% |
| 0.02380952381 | 17 | 0.4% |
| 0.09090909091 | 15 | 0.3% |
| 0.08333333333 | 14 | 0.3% |
| 0.02941176471 | 13 | 0.3% |
| 0.07692307692 | 13 | 0.3% |
| Other values (1233) | 2671 |
| Value | Count | Frequency (%) |
| 0 | 33 | |
| 0.005449591281 | 1 | < 0.1% |
| 0.005464480874 | 1 | < 0.1% |
| 0.005479452055 | 1 | < 0.1% |
| 0.005494505495 | 1 | < 0.1% |
| 0.005586592179 | 2 | < 0.1% |
| 0.005602240896 | 1 | < 0.1% |
| 0.005617977528 | 2 | < 0.1% |
| 0.00566572238 | 1 | < 0.1% |
| 0.005681818182 | 2 | < 0.1% |
| Value | Count | Frequency (%) |
| 17 | 1 | < 0.1% |
| 4 | 2 | < 0.1% |
| 3 | 3 | 0.1% |
| 2 | 50 | 1.1% |
| 1.142857143 | 1 | < 0.1% |
| 1 | 1501 | |
| 0.75 | 1 | < 0.1% |
| 0.6666666667 | 4 | 0.1% |
| 0.5588235294 | 1 | < 0.1% |
| 0.5388739946 | 1 | < 0.1% |
gross_revenue
Real number (ℝ)
HIGH CORRELATION, SKEWED
| Distinct | 4307 |
|---|---|
| Distinct (%) | 98.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1896.819324 |
| Minimum | -4287.63 |
| Maximum | 279489.02 |
| Zeros | 9 |
| Zeros (%) | 0.2% |
| Negative | 42 |
| Negative (%) | 1.0% |
| Memory size | 34.2 KiB |
Quantile statistics
| Minimum | -4287.63 |
|---|---|
| 5-th percentile | 101.343 |
| Q1 | 293.6175 |
| median | 648.55 |
| Q3 | 1612.625 |
| 95-th percentile | 5613.4125 |
| Maximum | 279489.02 |
| Range | 283776.65 |
| Interquartile range (IQR) | 1319.0075 |
Descriptive statistics
| Standard deviation | 8223.13024 |
|---|---|
| Coefficient of variation (CV) | 4.335220617 |
| Kurtosis | 607.4362874 |
| Mean | 1896.819324 |
| Median Absolute Deviation (MAD) | 455.145 |
| Skewness | 21.72254957 |
| Sum | 8273925.89 |
| Variance | 67619870.94 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 9 | 0.2% |
| 76.32 | 4 | 0.1% |
| 15 | 3 | 0.1% |
| 35.4 | 3 | 0.1% |
| 363.65 | 3 | 0.1% |
| 440 | 3 | 0.1% |
| 318.05 | 2 | < 0.1% |
| 331 | 2 | < 0.1% |
| 204 | 2 | < 0.1% |
| 120 | 2 | < 0.1% |
| Other values (4297) | 4329 |
| Value | Count | Frequency (%) |
| -4287.63 | 1 | |
| -1592.49 | 1 | |
| -1192.2 | 1 | |
| -1165.3 | 1 | |
| -1126 | 1 | |
| -840.76 | 1 | |
| -611.86 | 1 | |
| -451.42 | 1 | |
| -295.09 | 1 | |
| -227.44 | 1 |
| Value | Count | Frequency (%) |
| 279489.02 | 1 | |
| 256438.49 | 1 | |
| 187482.17 | 1 | |
| 132572.62 | 1 | |
| 123725.45 | 1 | |
| 113384.14 | 1 | |
| 88125.38 | 1 | |
| 65892.08 | 1 | |
| 62653.1 | 1 | |
| 59419.34 | 1 |
relative_revenue
Real number (ℝ)
HIGH CORRELATION, SKEWED
| Distinct | 4307 |
|---|---|
| Distinct (%) | 98.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1896.819324 |
| Minimum | -4287.63 |
| Maximum | 279489.02 |
| Zeros | 9 |
| Zeros (%) | 0.2% |
| Negative | 42 |
| Negative (%) | 1.0% |
| Memory size | 34.2 KiB |
Quantile statistics
| Minimum | -4287.63 |
|---|---|
| 5-th percentile | 101.343 |
| Q1 | 293.6175 |
| median | 648.55 |
| Q3 | 1612.625 |
| 95-th percentile | 5613.4125 |
| Maximum | 279489.02 |
| Range | 283776.65 |
| Interquartile range (IQR) | 1319.0075 |
Descriptive statistics
| Standard deviation | 8223.13024 |
|---|---|
| Coefficient of variation (CV) | 4.335220617 |
| Kurtosis | 607.4362874 |
| Mean | 1896.819324 |
| Median Absolute Deviation (MAD) | 455.145 |
| Skewness | 21.72254957 |
| Sum | 8273925.89 |
| Variance | 67619870.94 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 9 | 0.2% |
| 76.32 | 4 | 0.1% |
| 15 | 3 | 0.1% |
| 35.4 | 3 | 0.1% |
| 363.65 | 3 | 0.1% |
| 440 | 3 | 0.1% |
| 318.05 | 2 | < 0.1% |
| 331 | 2 | < 0.1% |
| 204 | 2 | < 0.1% |
| 120 | 2 | < 0.1% |
| Other values (4297) | 4329 |
| Value | Count | Frequency (%) |
| -4287.63 | 1 | |
| -1592.49 | 1 | |
| -1192.2 | 1 | |
| -1165.3 | 1 | |
| -1126 | 1 | |
| -840.76 | 1 | |
| -611.86 | 1 | |
| -451.42 | 1 | |
| -295.09 | 1 | |
| -227.44 | 1 |
| Value | Count | Frequency (%) |
| 279489.02 | 1 | |
| 256438.49 | 1 | |
| 187482.17 | 1 | |
| 132572.62 | 1 | |
| 123725.45 | 1 | |
| 113384.14 | 1 | |
| 88125.38 | 1 | |
| 65892.08 | 1 | |
| 62653.1 | 1 | |
| 59419.34 | 1 |
relative_quantity
Real number (ℝ)
HIGH CORRELATION, SKEWED
| Distinct | 785 |
|---|---|
| Distinct (%) | 18.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 256.3507565 |
| Minimum | -2887 |
| Maximum | 37684 |
| Zeros | 15 |
| Zeros (%) | 0.3% |
| Negative | 47 |
| Negative (%) | 1.1% |
| Memory size | 34.2 KiB |
Quantile statistics
| Minimum | -2887 |
|---|---|
| 5-th percentile | 16 |
| Q1 | 57 |
| median | 110 |
| Q3 | 219 |
| 95-th percentile | 673.9 |
| Maximum | 37684 |
| Range | 40571 |
| Interquartile range (IQR) | 162 |
Descriptive statistics
| Standard deviation | 981.5809106 |
|---|---|
| Coefficient of variation (CV) | 3.829054081 |
| Kurtosis | 595.7323081 |
| Mean | 256.3507565 |
| Median Absolute Deviation (MAD) | 67 |
| Skewness | 20.26712382 |
| Sum | 1118202 |
| Variance | 963501.084 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 28 | 39 | 0.9% |
| 67 | 37 | 0.8% |
| 51 | 36 | 0.8% |
| 72 | 34 | 0.8% |
| 60 | 34 | 0.8% |
| 48 | 32 | 0.7% |
| 46 | 32 | 0.7% |
| 69 | 31 | 0.7% |
| 84 | 31 | 0.7% |
| 75 | 31 | 0.7% |
| Other values (775) | 4025 |
| Value | Count | Frequency (%) |
| -2887 | 1 | |
| -189 | 1 | |
| -157 | 1 | |
| -151 | 1 | |
| -144 | 1 | |
| -98 | 1 | |
| -71 | 1 | |
| -70 | 1 | |
| -69 | 1 | |
| -56 | 2 |
| Value | Count | Frequency (%) |
| 37684 | 1 | |
| 21352 | 1 | |
| 15225 | 1 | |
| 14990 | 1 | |
| 14977 | 1 | |
| 12871 | 1 | |
| 12869 | 1 | |
| 12261 | 1 | |
| 10229 | 1 | |
| 9563 | 1 |
relative_invoices
Real number (ℝ)
HIGH CORRELATION, ZEROS
| Distinct | 56 |
|---|---|
| Distinct (%) | 1.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.407611188 |
| Minimum | -6 |
| Maximum | 195 |
| Zeros | 257 |
| Zeros (%) | 5.9% |
| Negative | 82 |
| Negative (%) | 1.9% |
| Memory size | 34.2 KiB |
Quantile statistics
| Minimum | -6 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 1 |
| median | 2 |
| Q3 | 4 |
| 95-th percentile | 11 |
| Maximum | 195 |
| Range | 201 |
| Interquartile range (IQR) | 3 |
Descriptive statistics
| Standard deviation | 6.326680805 |
|---|---|
| Coefficient of variation (CV) | 1.856632244 |
| Kurtosis | 292.1130986 |
| Mean | 3.407611188 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 12.77530276 |
| Sum | 14864 |
| Variance | 40.02689 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 1520 | |
| 2 | 804 | |
| 3 | 480 | 11.0% |
| 4 | 347 | 8.0% |
| 0 | 257 | 5.9% |
| 5 | 215 | 4.9% |
| 6 | 163 | 3.7% |
| 7 | 95 | 2.2% |
| -1 | 72 | 1.7% |
| 8 | 65 | 1.5% |
| Other values (46) | 344 | 7.9% |
| Value | Count | Frequency (%) |
| -6 | 1 | < 0.1% |
| -3 | 1 | < 0.1% |
| -2 | 8 | 0.2% |
| -1 | 72 | 1.7% |
| 0 | 257 | 5.9% |
| 1 | 1520 | |
| 2 | 804 | |
| 3 | 480 | 11.0% |
| 4 | 347 | 8.0% |
| 5 | 215 | 4.9% |
| Value | Count | Frequency (%) |
| 195 | 1 | |
| 154 | 1 | |
| 83 | 1 | |
| 79 | 1 | |
| 76 | 1 | |
| 70 | 1 | |
| 64 | 1 | |
| 58 | 2 | |
| 51 | 1 | |
| 50 | 1 |
Pearson's r
Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a linear relationship the steepness of the line, that is, its angle to the x-axis, does not affect r.
To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
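The calculation described above can be sketched in a few lines of plain Python. This is an illustrative implementation, not the code the report ran; the function name `pearson_r` and the sample data are assumptions made for the example. It also checks the invariance under shifts and positive rescaling.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors of the covariance and the variances cancel, so sums suffice.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: roughly linear with a little noise.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]

r = pearson_r(xs, ys)
# Shifting and positively rescaling either variable leaves r unchanged.
r_shifted = pearson_r([10 * x + 3 for x in xs], [0.5 * y - 7 for y in ys])
print(r, r_shifted)
```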
Spearman's ρ
Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.
To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
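As a minimal sketch of that definition, the code below ranks each variable and then applies the Pearson formula to the ranks. The helper names (`average_ranks`, `pearson_r`, `spearman_rho`) and the data are assumptions for illustration; ties are given average ranks, which is a common convention but not something the text above specifies.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions (assumed convention)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(xs, ys):
    """Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Hypothetical data: y = x**3 is monotonic but not linear.
xs = [1, 2, 3, 4, 5]
ys = [1, 8, 27, 64, 125]
print(spearman_rho(xs, ys), pearson_r(xs, ys))
```

On this data ρ reaches its maximum while r stays below 1, which is the sense in which ρ catches nonlinear monotonic relationships.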
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.
To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
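The pair-counting definition above corresponds to the variant usually called tau-a, sketched below. The function name `kendall_tau` and the data are assumptions for the example. Libraries such as pandas and SciPy typically use tau-b, which adjusts for ties, so results can differ when the data contain tied values.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Tau-a: (concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        # Positive product: both variables move in the same direction.
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # Tied pairs (sign == 0) count toward the total only.
    return (concordant - discordant) / len(pairs)


# Hypothetical data: 7 concordant and 3 discordant pairs out of 10.
xs = [1, 2, 3, 4, 5]
ys = [3, 1, 4, 2, 5]
print(kendall_tau(xs, ys))
```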
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.
First rows
| | df_index | customer_id | recency_p | quantity_p | quantity_d | avg_ticket | avg_recency_days | avg_basket_size | avg_variety | purchases_pday | gross_revenue | relative_revenue | relative_quantity | relative_invoices |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 17850 | 372.0 | 35.0 | 21.0 | 18.152222 | 124.333333 | 50.970588 | 8.735294 | 17.000000 | 5288.63 | 5288.63 | 14.0 | 33.0 |
| 1 | 1 | 13047 | 31.0 | 132.0 | 6.0 | 18.822907 | 26.642857 | 139.100000 | 17.200000 | 0.029155 | 3079.10 | 3079.10 | 126.0 | 2.0 |
| 2 | 2 | 12583 | 2.0 | 1569.0 | 50.0 | 29.479271 | 20.722222 | 337.333333 | 16.466667 | 0.040323 | 7187.34 | 7187.34 | 1519.0 | 12.0 |
| 3 | 3 | 13748 | 95.0 | 169.0 | -0.0 | 33.866071 | 93.250000 | 87.800000 | 5.600000 | 0.017921 | 948.25 | 948.25 | 169.0 | 5.0 |
| 4 | 4 | 15100 | 333.0 | 48.0 | 22.0 | 292.000000 | 62.166667 | 26.666667 | 1.000000 | 0.073171 | 635.10 | 635.10 | 26.0 | 0.0 |
| 5 | 5 | 15291 | 25.0 | 508.0 | 27.0 | 45.323301 | 21.941176 | 140.200000 | 6.866667 | 0.042980 | 4596.51 | 4596.51 | 481.0 | 10.0 |
| 6 | 6 | 14688 | 7.0 | 579.0 | 281.0 | 17.219786 | 17.761905 | 172.428571 | 15.571429 | 0.057221 | 5107.38 | 5107.38 | 298.0 | 15.0 |
| 7 | 7 | 17809 | 16.0 | 961.0 | 41.0 | 88.719836 | 31.083333 | 171.416667 | 5.083333 | 0.033520 | 4627.62 | 4627.62 | 920.0 | 9.0 |
| 8 | 8 | 15311 | 0.0 | 2167.0 | 231.0 | 25.543464 | 4.098901 | 419.714286 | 26.142857 | 0.243316 | 59419.34 | 59419.34 | 1936.0 | 64.0 |
| 9 | 9 | 14527 | 2.0 | 198.0 | 3.0 | 8.753930 | 5.828125 | 37.981818 | 17.672727 | 0.149457 | 7711.38 | 7711.38 | 195.0 | 24.0 |
Last rows
| | df_index | customer_id | recency_p | quantity_p | quantity_d | avg_ticket | avg_recency_days | avg_basket_size | avg_variety | purchases_pday | gross_revenue | relative_revenue | relative_quantity | relative_invoices |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4352 | 5950 | 16000 | 2.0 | 770.0 | -0.0 | 1377.077778 | 2.0 | 1703.333333 | 3.0 | 3.0 | 12393.70 | 12393.70 | 770.0 | 3.0 |
| 4353 | 5951 | 15195 | 2.0 | 1404.0 | -0.0 | 3861.000000 | 2.0 | 1404.000000 | 1.0 | 1.0 | 3861.00 | 3861.00 | 1404.0 | 1.0 |
| 4354 | 5952 | 14087 | 2.0 | 113.0 | 1.0 | 2.817681 | 2.0 | 251.000000 | 69.0 | 1.0 | 181.67 | 181.67 | 112.0 | 0.0 |
| 4355 | 5953 | 14204 | 2.0 | 21.0 | -0.0 | 3.659773 | 2.0 | 82.000000 | 44.0 | 1.0 | 161.03 | 161.03 | 21.0 | 1.0 |
| 4356 | 5954 | 15471 | 2.0 | 102.0 | -0.0 | 6.097143 | 2.0 | 266.000000 | 77.0 | 1.0 | 469.48 | 469.48 | 102.0 | 1.0 |
| 4357 | 5958 | 13436 | 1.0 | 58.0 | -0.0 | 16.407500 | 1.0 | 76.000000 | 12.0 | 1.0 | 196.89 | 196.89 | 58.0 | 1.0 |
| 4358 | 5960 | 15520 | 1.0 | 134.0 | -0.0 | 19.083333 | 1.0 | 314.000000 | 18.0 | 1.0 | 343.50 | 343.50 | 134.0 | 1.0 |
| 4359 | 5962 | 13298 | 1.0 | 96.0 | -0.0 | 180.000000 | 1.0 | 96.000000 | 2.0 | 1.0 | 360.00 | 360.00 | 96.0 | 1.0 |
| 4360 | 5963 | 14569 | 1.0 | 70.0 | -0.0 | 18.949167 | 1.0 | 79.000000 | 12.0 | 1.0 | 227.39 | 227.39 | 70.0 | 1.0 |
| 4361 | 5970 | 12713 | 0.0 | 101.0 | -0.0 | 22.330263 | 0.0 | 508.000000 | 38.0 | 1.0 | 848.55 | 848.55 | 101.0 | 1.0 |